An open-source shallow-transfer machine translation toolbox: consequences of its release and availability
Abstract
By the time Machine Translation Summit X is held in September 2005, our group will have released an open-source machine translation toolbox as part of a large government-funded project involving four universities and three linguistic technology companies from Spain. The machine translation toolbox, which will most likely be released under a GPL-like license, includes (a) the open-source engine itself, a modular shallow-transfer machine translation engine suitable for related languages and largely based upon that of systems we have already developed, such as interNOSTRUM for Spanish–Catalan and Traductor Universia for Spanish–Portuguese; (b) extensive documentation (including document type declarations) specifying the XML format of all linguistic data files (dictionaries, rules) and document format management files; (c) compilers converting these data into the high-speed format used by the engine (tens of thousands of words per second); and (d) pilot linguistic data for Spanish–Catalan and Spanish–Galician, together with format management specifications for the HTML, RTF and plain text formats. After briefly describing this toolbox, this paper explores possible consequences of the availability of this architecture, including the community-driven development of machine translation systems for languages lacking this kind of linguistic technology.
Similar resources
Open-Source Portuguese-Spanish Machine Translation
This paper describes the current status of development of an open-source shallow-transfer machine translation (MT) system for the [European] Portuguese ↔ Spanish language pair, developed using the OpenTrad Apertium MT toolbox (www.apertium.org). Apertium uses finite-state transducers for lexical processing, hidden Markov models for part-of-speech tagging, and finite-state-based chunking for str...
RuLearn: an Open-source Toolkit for the Automatic Inference of Shallow-transfer Rules for Machine Translation
This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and requires, in con...
ScaleMT: a Free/Open-Source Framework for Building Scalable Machine Translation Web Services
Usage of machine translation web services is growing rapidly, mainly because of the translation quality and reliability of the service provided by the Google Ajax Language API. To allow open-source machine translation projects to compete with Google's service and gain visibility on the internet, we have developed ScaleMT: a free/open-source framework that exposes existing machine translation engine...
Sharing resources between free/open-source rule-based machine translation systems: Grammatical Framework and Apertium
In this paper, we describe two methods developed for sharing linguistic data between two free/open-source rule-based machine translation systems: Apertium, a shallow-transfer system, and Grammatical Framework (GF), which performs a deeper syntactic transfer. In the first method, we describe the conversion of lexical data from Apertium to GF, while in the second one we automatically extract ...
An Open-Source Shallow-Transfer Machine Translation Engine for the Romance Languages of Spain
We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (the main ones being Spanish, Catalan and Galician) as part of a larger government-funded project which includes non-Romance languages such as Basque and involves both universities and linguistic technology companies. The machine translation architecture...